Using mutual information to select event-related components in ICA 
1. Introduction

Independent Component Analysis (ICA) is a method for decomposing spatio-temporal data into a set of statistically independent components through the application of a data-dependent un-mixing matrix. ICA is a potentially important tool for removing physiological and environmental noise from EEG and MEG (E/MEG) signals [1],[4]. Underlying this application is the reasonable assumption that noise unrelated to the task is independent of the event-related signals, so that the two can be separated by ICA. ICA can also be used to identify multiple event-related components in evoked response and cognitive studies [2]. In this case, there is often an implicit assumption that the event-related components are mutually statistically independent [4]. An alternative interpretation of the event-related components identified by ICA is to view them as a decomposition of the underlying (possibly dependent) components in the data. This is analogous to the orthogonal components identified by a PCA (principal components analysis), or equivalently an SVD (singular value decomposition), of the data. If ICA is to be used as part of a source-localization procedure, the spatial vectors associated with each of the identified event-related temporal components can be used to construct a signal subspace. This follows directly by analogy with PCA, where the spatial components are used to construct a signal subspace from which focal current source locations are found using the MUSIC algorithm [3].

The attraction of using ICA methods for signal 
subspace identification is that they can be applied 
directly to raw (unaveraged) E/MEG data and do 
not require that the noise components in the data be 
spatially uncorrelated. Conversely, SVD-based 
identification of the signal subspace assumes that 
any noise in the data is spatially white [3]. 

Here we investigate the utility of ICA for estimating the signal subspace. We compare its performance with that of conventional SVD-based methods applied to both raw and averaged data.


In using ICA for signal subspace selection, we must identify which components are event-related. Typically, these are identified by visual inspection [1],[2],[4]. Since raw E/MEG data are often very noisy, it can be difficult to discern which components should be selected. Selecting components based on power [2] does not ensure that they are event-related. Simply using the correlation between each estimated component and the event trigger is critically dependent on the shape of the signals.

We propose here a method that selects the independent components related to the task using a mutual information metric. The approach is invariant under monotonic transformations of the signal values and allows detection of task-related components in the presence of large amounts of noise.

2. Methods 

2.1. Component selection using ICA and mutual information


Event-related MEG data can be represented by a spatio-temporal data vector X(t). It is formed by multiplying a mixing matrix $A = [a_1 \cdots a_M]$ by a vector of signal components:

$$X(t) = A\,P(t) + \sigma N(t) = \begin{bmatrix} a_1 & \cdots & a_M \end{bmatrix} \begin{bmatrix} p_1(t) \\ \vdots \\ p_M(t) \end{bmatrix} + \sigma N(t) \qquad (1)$$

Here $p_i(t)$ is the time course of the i-th component. N(t) represents additional non-event-related noise in the data, scaled by a standard deviation σ.
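A minimal numerical sketch of the data model (1) follows. It assumes the data are stored as a sensors-by-samples array. The function name `simulate_mixture` is hypothetical. The white Gaussian default for N(t) is only a stand-in: the simulations in Section 3 used recorded resting-state brain noise.

```python
import numpy as np

def simulate_mixture(A, P, sigma, noise=None, rng=None):
    """Form X(t) = A P(t) + sigma * N(t), following Eq. (1).

    A:     (n_sensors, M) mixing matrix whose columns are a_1 .. a_M
    P:     (M, n_samples) event-related time courses p_i(t)
    sigma: noise scale
    noise: optional (n_sensors, n_samples) noise N(t)
    """
    rng = np.random.default_rng() if rng is None else rng
    if noise is None:
        # Assumption: white Gaussian stand-in; the paper added recorded brain noise.
        noise = rng.standard_normal((A.shape[0], P.shape[1]))
    return A @ P + sigma * noise
```

Setting `sigma = 0` gives the noise-free mixture A P(t). This is the idealized case used in Eq. (3).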

Our goal is to estimate the true signal subspace of the event-related signals, i.e. the subspace spanned by the columns of the mixing matrix $A = [a_1 \cdots a_M]$ in (1). Accurate source localization requires accurate estimation of this subspace. We therefore use the principal angles [3] between the true and estimated signal subspaces as a measure of their distance. This indicates how well each approach selects components from which sources can be accurately localized.
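Principal angles can be computed with a standard QR-plus-SVD construction. First orthonormalize a basis of each subspace; the singular values of the product of the two orthonormal bases are then the cosines of the principal angles. The sketch below assumes NumPy. The function name is hypothetical, and [3] may organize the computation differently.

```python
import numpy as np

def principal_angle_cosines(U, V):
    """Cosines of the principal angles between span(U) and span(V).

    The columns of U and V are basis vectors (not necessarily orthonormal).
    Values are returned in descending order, so the first entry is the
    cosine of the 1st (smallest) principal angle.
    """
    Qu, _ = np.linalg.qr(U)  # orthonormal basis for span(U)
    Qv, _ = np.linalg.qr(V)  # orthonormal basis for span(V)
    s = np.linalg.svd(Qu.T @ Qv, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against round-off slightly above 1
```

A perfect estimate of a two-dimensional signal subspace gives cosines `[1, 1]`. Any basis of the same span gives the same result, so the metric does not depend on how the selected spatial vectors are scaled or ordered.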
Performing ICA on the spatio-temporal data X(t) yields a spatial un-mixing matrix B and a collection of independent temporal components $\hat{P}(t)$:

$$\hat{P}(t) = B\,X(t) \qquad (2)$$

Under the idealized model of statistically independent components, σ = 0, and exact identification of the un-mixing matrix B, we have:

$$X(t) = B^{-1}\hat{P}(t) = A\,P(t) \qquad (3)$$

The un-mixing matrix B is an N×N matrix, where N is the number of sensors. Our goal is to select those columns of $B^{-1}$, the estimated mixing matrix, that give the best estimate of the signal subspace.

To select these columns we propose using a “Mutual Information Spectrum”. It is computed as the mutual information (MI) between each of the identified components $\hat{p}_i(t)$ and a signal s(t), a binary time series that records the onset (trigger) of each event, i.e.

$$I_i = I\big(\hat{p}_i(t),\; s(t) * k(t)\big), \qquad i = 1, \ldots, M \qquad (4)$$

Here k(t) is an expansion kernel that produces a monotonic variation in s(t) * k(t) over the inter-stimulus period, and * denotes convolution. In the following we used a ramp function as the expansion kernel.
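One way to form the reference s(t) * k(t) in (4) is sketched below. It assumes that triggers are given as sample indices. It also assumes that the ramp k(t) = t is truncated to a length `ramp_len` chosen close to the inter-stimulus interval. Both the input format and `ramp_len` are assumptions; the text does not specify the ramp's length or slope.

```python
import numpy as np

def trigger_reference(onsets, n_samples, ramp_len):
    """Build s(t) * k(t): a binary trigger train convolved with a ramp.

    onsets:    sample indices of the event triggers (assumed input format)
    n_samples: length of the recording in samples
    ramp_len:  kernel length in samples (hypothetical; ~ inter-stimulus interval)
    """
    s = np.zeros(n_samples)
    s[np.asarray(onsets)] = 1.0                 # binary trigger series s(t)
    k = np.arange(ramp_len, dtype=float)        # ramp kernel k(t) = t for 0 <= t < ramp_len
    return np.convolve(s, k)[:n_samples]        # causal convolution, truncated to data length
```

The result is a sawtooth that restarts at zero at every trigger and rises monotonically until the next one. If `ramp_len` exceeds the gap between consecutive triggers, neighbouring ramps overlap and add, and the reset at each new event is lost.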

The MI for two random variables x and y with joint pdf p(x,y) is defined as

$$I(x,y) = \iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy = H(x) - H(x \mid y) = H(y) - H(y \mid x) \qquad (5)$$

Here H(z) denotes the entropy of z. We used a simple scaled histogram method to compute the MI as the difference between the estimated marginal and conditional entropies. We then define the MI spectrum as the rank-ordered elements $I_i$, with $I_i \ge I_{i+1}$.
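The histogram estimate of (5) and the ranking that defines the MI spectrum might be sketched as follows. This is an assumed implementation, not the authors' code. The text does not say how the histogram was scaled or binned, so equal-width bins and the `n_bins` value are illustrative choices. H(x|y) is obtained as H(x, y) - H(y).

```python
import numpy as np

def _entropy(counts):
    """Plug-in Shannon entropy (in nats) of a histogram given as counts."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

def mutual_information(x, y, n_bins=16):
    """Histogram estimate of I(x, y) = H(x) - H(x|y), in nats.

    n_bins is a hypothetical choice of equal-width bins per axis.
    """
    joint, _, _ = np.histogram2d(x, y, bins=n_bins)
    h_x = _entropy(joint.sum(axis=1))               # marginal histogram of x
    h_y = _entropy(joint.sum(axis=0))               # marginal histogram of y
    h_x_given_y = _entropy(joint.ravel()) - h_y     # H(x|y) = H(x,y) - H(y)
    return h_x - h_x_given_y

def mi_spectrum(components, reference, n_bins=16):
    """MI of each row of `components` with the reference s(t)*k(t),
    sorted in decreasing order, plus the matching component indices."""
    mi = np.array([mutual_information(c, reference, n_bins) for c in components])
    order = np.argsort(mi)[::-1]
    return mi[order], order
```

Because equal-width bins depend on the signal values, this particular estimate is only approximately invariant under monotonic transformations. Binning on ranks (equal-frequency bins) would make it exactly invariant.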

To construct the signal subspace estimate, we then choose the corresponding spatial vectors from the estimated mixing matrix in the order indicated by the MI spectrum.
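Given the MI values, building the signal subspace estimate amounts to reordering the columns of $B^{-1}$ and keeping the leading ones. The minimal sketch below assumes the analyst chooses the number of retained components `k`; the text leaves this choice open, and the simulations in Section 3 contain M = 2 event-related components. The function name is hypothetical.

```python
import numpy as np

def select_signal_subspace(B, mi_values, k):
    """Spatial vectors of the k components with the largest MI.

    B:         (N, N) ICA un-mixing matrix, so P_hat(t) = B X(t) and the
               columns of inv(B) are the components' spatial patterns
    mi_values: MI of each component with the trigger reference (length N)
    k:         number of components to keep (an analyst's choice)
    """
    A_hat = np.linalg.inv(B)               # estimated mixing matrix
    order = np.argsort(mi_values)[::-1]    # MI spectrum order, largest first
    return A_hat[:, order[:k]]
```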

2.2. Component selection using SVD 

The more conventional method for estimating the signal subspace is to compute an SVD of the stimulus-locked averaged data and use the spatial components corresponding to the largest singular values. Averaging, however, can obscure event-related components whose latency varies substantially from trial to trial. We therefore also investigated selecting the signal subspace by computing an SVD directly from the unaveraged data. In this case, spatially correlated noise that is independent of the event may produce large singular values, so subspace selection based on the singular values may perform poorly. Instead, we use the MI spectrum to select the signal subspace from the components of the SVD computed from the unaveraged data.
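The two SVD-based alternatives can be sketched in the same style. The sketch assumes epochs are stored as an (epochs, sensors, samples) array aligned to the trigger, and raw data as a sensors-by-samples array. Function names are hypothetical. For SVD-raw, the sketch returns the temporal components $U^T X$ so that they can be scored with an MI estimate rather than ranked by singular value.

```python
import numpy as np

def svd_av_subspace(epochs, k):
    """SVD-av: leading left singular vectors of the stimulus-locked average.

    epochs: (n_epochs, n_sensors, n_times), each epoch aligned to its trigger
    Returns the k spatial vectors with the largest singular values, and
    the full singular value spectrum.
    """
    avg = epochs.mean(axis=0)                            # (n_sensors, n_times)
    U, s, _ = np.linalg.svd(avg, full_matrices=False)
    return U[:, :k], s

def svd_raw_components(X):
    """SVD-raw: spatial vectors and temporal components of unaveraged data.

    X: continuous data (n_sensors, n_samples). Row i of U.T @ X is the
    time course associated with spatial vector U[:, i].
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U, U.T @ X
```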

3. Simulation Studies 


3.1. Modeling multiple sources with trial-to-trial variations in latency and amplitude


To investigate the performance of the ICA- and SVD-based methods for estimating the signal subspace, we used the kernel-based model of evoked potentials described by Lange et al. [5]. With it we simulated multiple-sensor, multiple-epoch MEG data. This model includes random trial-to-trial variations in the latency and amplitude of the signal components.

The model decomposes an evoked response p(t) into a superposition of Gaussian kernels with varying amplitudes and delays. It assumes that an event-related signal is produced as the net sum of activations of several neuronal assemblies. These assemblies have different intensities $c_i$ and propagation delays $\tau_i$ with respect to the event trigger. Formally, this can be expressed as


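$$p(t) = \sum_i c_i\, u(t - \tau_i) \qquad (6)$$

$$u(t - \tau_i) = \exp\!\left(-\frac{(t - \tau_i)^2}{2w^2}\right) \qquad (7)$$

where w denotes the width of the Gaussian kernel.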
Fig. 1 shows an example of how a realistic event-related signal can be produced as a sum of delayed Gaussian kernels $u(t - \tau_i)$.

Figure 1: Example of the decomposition of an event-related signal into a sum of delayed Gaussian kernels.

To simulate trial-to-trial variation of the responses, random variables $\beta_i$ and $\theta_i$ are included in the model to simulate amplitude and latency variability, respectively. The latency $\theta_i$ was modeled as a Gaussian random variable with mean equal to half of the mean variance of peak latency as reported in the literature [6],[7]. The amplitude $\beta_i$ was Gaussian with unit mean and variance 0.15. The signal model was then:

$$p(t) = \sum_i \beta_i\, c_i\, u(t - \tau_i - \theta_i) \qquad (8)$$
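Equations (6) to (8) can be evaluated numerically as sketched below. The following are assumptions for illustration: the Gaussian kernel width, drawing $\beta_i$ and $\theta_i$ independently for every kernel on every trial, and the latency mean and standard deviation. The literature values from [6],[7] are not reproduced. The amplitude variance of 0.15 follows the text. Function names are hypothetical.

```python
import numpy as np

def gaussian_kernel_response(t, c, tau, width, beta=None, theta=None):
    """Evaluate p(t) = sum_i beta_i * c_i * u(t - tau_i - theta_i), Eq. (8).

    u is a unit-height Gaussian kernel of the given width (assumed parameter).
    With beta_i = 1 and theta_i = 0 this reduces to Eq. (6).
    """
    t = np.asarray(t, dtype=float)
    c, tau = np.asarray(c, dtype=float), np.asarray(tau, dtype=float)
    beta = np.ones_like(c) if beta is None else np.asarray(beta, dtype=float)
    theta = np.zeros_like(c) if theta is None else np.asarray(theta, dtype=float)
    shifted = t[None, :] - (tau + theta)[:, None]    # one row per kernel
    u = np.exp(-shifted**2 / (2.0 * width**2))       # Gaussian kernels, Eq. (7)
    return (beta * c) @ u                            # weighted sum over kernels

def simulate_trials(t, c, tau, width, n_trials, latency_mean=0.0,
                    latency_sd=0.0, rng=None):
    """Draw n_trials responses with random amplitude and latency per kernel.

    beta_i ~ N(1, 0.15) (variance 0.15, as in the text).
    theta_i ~ N(latency_mean, latency_sd**2); both latency values are
    hypothetical placeholders for the literature values used in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_k = len(c)
    trials = []
    for _ in range(n_trials):
        beta = rng.normal(1.0, np.sqrt(0.15), n_k)
        theta = rng.normal(latency_mean, latency_sd, n_k)
        trials.append(gaussian_kernel_response(t, c, tau, width, beta, theta))
    return np.array(trials)                          # (n_trials, len(t))
```

Rows of the returned array are single-trial time courses p(t). They could be mixed into sensor space with the spatial vectors $a_i$ and noise according to (1).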

To produce realistic signals p(t), we used parameters in (6) obtained by fitting the model to time courses estimated from real data. These time courses were computed for three sources localized from averaged somatosensory MEG data using the MUSIC algorithm [3]. The spatial vectors $a_1, \ldots, a_M$ for these sources were chosen as the topographies at the sensor array estimated for each source by the MUSIC algorithm. We added brain noise N(t), recorded from a human subject at rest, to the event-related signals according to (1).

3.2. Results 

We generated data with 50 task repetitions according to (1) for a 64-sensor whole-head MEG system. These data contained M = 2 event-related components. To investigate sensitivity to background noise, we repeated our studies for several SNRs by varying the parameter σ. We computed the signal subspace estimates for each data set in three ways: ICA with MI-based component selection, SVD with averaged data (SVD-av), and SVD with unaveraged data (SVD-raw). Here we used the SOBI ICA algorithm described in [8].



Figure 2: MI spectra for the three decompositions compared to the singular value spectrum of SVD-av (SVD-av SV) for the data with 2 sources, -10 dB SNR.

In Fig. 2 we show the computed MI spectra for one 50-epoch data set with a low SNR. Here ICA finds only a single component with strong MI with the event trigger s(t). SVD-av finds two components with strong MI with s(t). SVD-raw also shows two strong components. However, there is no clear threshold between these and the subsequent components, which we know should not be event-related since there were only two such components in the simulated data.

Figure 3: Accuracy of signal subspace estimation vs. SNR for the three methods, measured as the cosine of the 1st principal angle.

In Fig. 3 we show the cosine of the first principal angle between the true and estimated signal subspaces for each of the methods. Ideally the cosine should equal unity. In this case, the best performance is delivered by the SVD-av technique. As the SNR increases, ICA performance improves to approximately match that of SVD-av. The SVD-raw method performs poorly at all SNRs.

Since there are two event-related components in this simulation, the second principal angle between the true and estimated subspaces should also have a cosine of unity. These cosines are shown in Fig. 4. In this case, SVD-av clearly outperforms the other methods. For very few repetitions (about 10 or fewer), we found that ICA delivered slightly better results than SVD-av, but in neither case were the estimates very close to the true signal subspace.



Figure 4: Accuracy of signal subspace estimation vs. SNR for the three methods, measured as the cosine of the 2nd principal angle.



















Figure 5: MI spectra of ICA and SVD-av, and singular value spectrum of SVD-av, for the finger-flexion study.


4. Application to somatosensory data 

We also applied the ICA and SVD-av methods 
for signal subspace estimation to human MEG data 
collected during a self-paced finger flexion study. 

We computed the MI spectrum of ICA and compared it to the MI spectrum computed from the projection of the raw data onto the left singular vectors computed using SVD-av. As shown in Fig. 5, the MI spectra for both ICA and SVD-av have two components with relatively strong mutual information. A comparison of the cosines of the first 10 principal angles between the SVD-av and ICA estimated subspaces is shown in Fig. 6. It reveals a strong correspondence between the subspaces for the first two components of the SVD-av and ICA analyses, while the remaining components show little similarity.

5. Discussion 

The mutual information spectrum is a straightforward and apparently reasonable approach to selecting event-related components in an ICA decomposition of raw E/MEG data. The role of the MI spectrum in ICA can be viewed as analogous to that of the singular value spectrum in an SVD or PCA decomposition. We compared ICA- and SVD-based signal subspace identification. For repetitive responses, SVD decomposition of averaged data appears more reliable than ICA for estimating a signal subspace. We found this to be the case even for very large latency variations causing significant suppression of the event-related components in the averaged data. The significance of our findings clearly depends on the degree to which our model for raw data reflects true trial-to-trial variations in event-related activity. This point is emphasized by the results presented for the finger flexion study, which show a smaller difference between ICA and SVD-av than the simulations do. Further studies of real and simulated data are needed to evaluate the efficacy of both approaches to identifying the spatial components in E/MEG data for localization of neural current sources.

Figure 6: 10 principal angles between the ICA and SVD-av estimated subspaces for the finger flexion data.